Search Results: "johns"

30 March 2013

John Sullivan: Vegan in Amsterdam?

I'll be traveling to Amsterdam next week for a free software conference. Does anyone have recommendations for restaurants that are vegan-friendly? Natural food stores? I'll be staying very near the Central Station.

18 February 2013

John Sullivan: SCALE

I will be speaking at the Southern California Linux Expo (and yes, given the topics covered, it's missing a GNU). My talk, "Four Freedoms for Freedom," is on Sunday, February 24, 2013 from 16:30 to 17:30.
The most obvious people affected by all four of the freedoms that define free software are the programmers. They are the ones who will likely want to -- and are able to -- modify software running on their computers. But free software is a movement to advance and defend freedom for anyone and everyone using any computing device, not just programmers. In many countries now, given the ubiquity of tablets, phones, laptops and desktops, "anyone and everyone using any computing device" means nearly all citizens. But new technological innovations in these areas keep coming with new restrictions, frustrating and controlling users even while creating a perception of empowerment. The Free Software Foundation wants to gain the support and protect the interests of everyone, not just programmers. How do we reach people who have no intention of ever modifying a program, and how do we help them?
Other presentations on my list to check out (in chronological order, some conflicting): If you will be there and want to meet up, drop me a line.

28 January 2013

John Sullivan: FOSDEM

I'll be at FOSDEM again this year, arriving in Brussels on Thursday the 31st and leaving on Tuesday the 5th. I'll be speaking on Sunday in the legal issues devroom at 10:00. If you will be there and want to meet up, let me know. I may be trying to watch the Super Bowl from there, a plan that didn't quite work out last year but seems more likely this year.

State of the GNUnion: FSF licensing policy challenges in 2013

This talk will cover the main challenges facing the Free Software Foundation's Licensing and Compliance lab in 2013, and will invite discussion of the FSF's work and policies in this area. We'll explore:

12 January 2013

John Sullivan: Aaron Swartz

Aaron was an inspiration to me personally, politically, and professionally ever since we met (ice cream and word games with a small group in a bank vault at Herrell's in Harvard Square) several years ago. I don't understand how things got to this point, but I know I'm angry along with Lessig. I'm so sorry for all of his family and friends; all the rest of us can do is try to make even a tiny sliver of the difference he did.

11 December 2012

Russell Coker: Links December 2012

Steven Johnson gave an interesting TED talk about where good ideas come from [1]. He starts by attributing some good ideas to coffee replacing alcohol as a standard drink and then moves on to how ideas develop. Erez Lieberman Aiden and Jean-Baptiste Michel gave an interesting and amusing TED talk about the ngram analysis of books that Google scanned [2]. Here is the link for the Google Books ngram search [3]. Clay Shirky gave an insightful TED talk about how the Internet is changing the world [4]. He cites Open Source programmers as the modern day equivalent to the Invisible College based on our supposed great ability to get along with other people on the Internet. If we really are so much better than the rest of the Internet then things must be bad out there. He ends with ways of using Git to draft legislation. Hans Rosling gave an interesting TED talk about religion and the number of babies that women have [5]. His conclusion is that it's more about income and social stability and that the world's population can stabilise at 10 billion if we provide family planning to everyone. Alexis C. Madrigal wrote an interesting interview with Genevieve Bell about her work at Intel and the way people use technology [6]. Indiegogo is raising funds for the "Cuddle Mattress", a mattress with foam slats and a special fitted sheet that allows your arm to slide between the slats [7]. So you could have your arm underneath your SO for an extended period of time without risking nerve damage. They also show that when sleeping on your side your shoulder can go between the slats to avoid back problems. Nate Silver (who is famous for predicting US elections) gave an interesting TED talk about racism and politics [8]. One of his main points is to show the correlation between racism and lack of contact with members of other races. Sociological Images has an interesting article by Lisa Wade about whether marriage is a universal human value [9].
In regard to historical marriage she says women were "human property, equivalent to children, slaves, servants, and employees". The general trend in the comments seems to be that there are so many types of marriage that it's difficult to make any specific claims about traditional marriage unless you count the tradition of a short period in a single geographic region. Plurality is an excellent sci-fi short movie on YouTube [10]. TED has an interesting interview with Hakeem Oluseyi about his research in astrophysics and how he achieved a successful career after being a gangster as a teenager [11]. He has some good ideas about helping other children from disadvantaged environments become successful. Paul Dwerryhouse wrote an interesting blog post about his work in designing and implementing a filesystem based on a Cassandra data store with FUSE [12]. Paul also wrote a post about using Apache Zookeeper to lock metadata to permit atomic operations [13]. The documentary Monumental Myths provides an interesting and insightful analysis of the processes of creating, maintaining, and explaining monuments [14]. It focusses on some significant monuments in the US and explains both sides of each story. Howard Zinn makes the insightful point that "when people present a certain point of view of history it's not controversial, as soon as you present the other side they call it controversial". That happens even in debates about current issues. Howard also says "to criticise whatever the government does is not anti-America, it's anti-government, it's pro-America, it's pro the people, it's pro the country". The song that plays during the closing credits is interesting too. The music video Same Love is one of the best presentations of the argument for marriage equality [15]. Chris Samuel wrote an interesting post about systems locked down for Windows 8 and options for purchasing PCs for running Linux [16]. His solution is to buy from ZaReason.
I saw his laptop in action at the last LUV meeting and it looks really nice. Unfortunately a byproduct of the extremely thin form factor is that it lacks a VGA port, which meant that Chris had to use my Thinkpad T61 (which is rather clunky by comparison) for his presentation.

26 November 2012

John Sullivan: Santiago de Compostela

I've put up my nearly unedited, unsorted, and uncommented photos from my recent trip to Santiago de Compostela, Spain, for the Libre Software World Conference. It was a beautiful place (and a great conference) -- I hope to write more about it soon.

15 October 2012

John Sullivan: Libre Software World Conference

I'm off to Spain to attend and speak at the Libre Software World Conference. If you're going to be there too, let me know.

11 August 2012

Russ Allbery: Review: Design Patterns

Review: Design Patterns, by Erich Gamma, et al.
Author: Erich Gamma
Author: Richard Helm
Author: Ralph Johnson
Author: John Vlissides
Publisher: Addison-Wesley
Copyright: 1995
Printing: September 1999
ISBN: 0-201-63361-2
Format: Hardcover
Pages: 374
Design Patterns: Elements of Reusable Object-Oriented Software by the so-called "Gang of Four" (Gamma, Helm, Johnson, and Vlissides) is one of the best-known books ever written about software design, and one of the most widely cited. The language introduced here, including the names of specific design patterns, is still in widespread use in the software field, particularly with object-oriented languages. I've had a copy for years, on the grounds that it's one of those books one should have a copy of, but only recently got around to reading it. The goal of this book is to identify patterns of design that are widely used, and widely useful, for designing object-oriented software. It's specific to the object-oriented model; while some of the patterns could be repurposed for writing OO-style programs in non-OO languages, they are about inheritance, encapsulation, and data hiding and make deep use of the facilities of object-oriented design. The patterns are very general, aiming for a description that's more general than any specific domain. They're also high-level, describing techniques and methods for constructing a software system, not algorithms. You couldn't encapsulate the ideas here in a library and just use them; they're ideas about program structure that could be applied to any program with the relevant problem. With the benefit of seventeen years of hindsight, I think the primary impact of this book has been on communication within the field. The ideas in here are not new to this book. Every pattern in Design Patterns was already in use in the industry before it was published; the goal was taxonomy, not innovation. One would not come to Design Patterns to learn how to program, although most introductory texts on object-oriented programming now borrow much of the pattern terminology.
Rather, Design Patterns is as influential as it is because it introduced a shared terminology and a rigor around that terminology, allowing writers and programmers to put a name to specific program structures and thus talk about them more clearly. This also allows one to take a step back and see a particular structure in multiple programs, compare and contrast how it's used, and draw some general conclusions about where it would be useful. I have the feeling that the authors originally hoped the book would serve as a toolbox, but I think it's instead become more of a dictionary. The pattern names standardized here are widely used even by people who have never read this book, but I doubt many people regularly turn to this book for ideas for how to structure programs. Design Patterns is divided into two parts: a general introduction to and definition of a software pattern followed by a case study, and then a catalog of patterns. The catalog is divided into creational patterns (patterns for creating objects), structural patterns (patterns for composing objects into larger structures), and behavioral patterns (patterns for interactions between objects). Each pattern in turn follows a very rigid presentation structure consisting of the pattern name and high-level classification, its basic intent, other common names, a scenario that motivates the pattern, comments on the applicability of the pattern, the structure and classes or objects that make up the pattern, how those participants collaborate, how the pattern achieves its goals, comments on implementation issues, sample code, known uses of the pattern in real-world software, and related patterns. As with a dictionary, the authors go to great lengths to keep the structure, terminology, and graphical representations uniform throughout, and the cross-referencing is comprehensive (to the point of mild redundancy). As for the patterns themselves, their success, both as terminology and as useful design elements, varies. 
Some have become part of the core lexicon of object-oriented programming (Factory Method, Builder, Singleton), sometimes to the point of becoming syntactic elements in modern OO languages (Iterator). These are terms that working programmers use daily. Others aren't quite as widespread, but are immediately recognizable as part of the core toolkit of object-oriented programming (Adapter, Decorator, Proxy, Observer, Strategy, Template Method). In some cases, the technique remains widespread, but the name hasn't caught on (Command, for example, which will be immediately familiar but which I rarely hear called by that name outside of specific uses inside UI toolkits due to ambiguity of terms). Other patterns are abstract enough that it felt like a bit of a reach to assign a name to them (Bridge, Composite, Facade), and I don't think use of those names is common, but the entries are still useful for definitional clarity and for comparing similar approaches with different implications. Only one pattern (Interpreter) struck me as insufficiently generic to warrant recording in a catalog of this type. So far, so good, but the obvious question arises: if you've not already read this book, should you read it? I think the answer to that is debatable. The largest problem with Design Patterns is that it's old. It came late enough in the development of object-oriented programming that it does capture much of the foundation, but OO design has continued to change and grow, and some patterns have either been developed subsequently or have become far more important. For example, Model-View-Controller is conspicuous by its absence, mentioned only in passing in the discussion of the Observer pattern. Any pattern catalog written today would have an extensive discussion. Similarly absent are Inversion of Control and Data Access Object, which are much more central to the day-to-day world of the modern programmer than, say, Memento or Visitor. 
One could easily go on: Lazy Initialization, Mock Object, Null Object... everyone will have their own list. A related problem is that all the implementation examples are shown in either C++ or Smalltalk (occasionally both). Those were probably the best languages to use at the time, but it's doubtful a modern rewrite would choose them. Smalltalk, in particular, I found nearly incomprehensible for the uninitiated, to the point where I ignored the code and only read the surrounding English description. C++ fares better, but Design Patterns occasionally drifts off into tedious discussions of how to work around C++'s limitations in ways that are irrelevant to the pattern itself and would not be necessary in, say, Java or Python. (This is ameliorated by the fact that C++, unlike Smalltalk, is still in widespread use, so those discussions remain moderately helpful for some readers.) Design Patterns is not, therefore, a very good source for a working knowledge of the most common patterns in use today. It has also become somewhat obsolete via its own success: the concept of a design pattern has become so popular that nearly all introductory texts include at least a basic discussion of design patterns and an introduction to the most notable and useful patterns. I think that's a more comfortable and more efficient way to pick up the basics than reading through this book, which is somewhat dense and which expects from the reader a reasonably good working knowledge of object-oriented programming. And, once you have the basics, MVC, DAO, and similar design patterns are probably more important than the more subtle design patterns presented here. That said, I think the rigor of description and the comparisons and discussions here still have some value. 
Design Patterns encourages the reader to look at patterns from a higher-level perspective, to think about meta-patterns such as the balance between data hiding and access, or between structure designed for the present purpose and structure that's adaptable to future needs. It's also mildly interesting from a historical standpoint; one can see the inspiration for future language designers in how problems are described here, and see how many of the implementation issues and negative consequences have been corrected or simplified by richer language designs. Overall, I would hesitate to recommend buying this book today, particularly at new textbook prices. But if you're a working object-oriented software designer or programmer, I think it's worth checking out from a library (and, thanks to its influence, any library with a decent software design section will almost certainly have a copy). Read the overall discussion, skim the catalog, and read the discussion of the patterns that strike your interest. It may help provide some additional structure and perspective to how you think about OO design. Rating: 6 out of 10

9 August 2012

John Sullivan: Tor UK goes DRM-free

UK Readers Can Now Purchase DRM-Free Books From Tor UK
"We believe that making our Tor ebooks DRM-free is the best for our readers, allowing you to use legitimately-purchased ebooks in perfectly legal ways, like moving your library from one ereader to another," says Jeremy Trevathan, Publisher at Pan Macmillan. "We understand that DRM can make your ebooks less easy to read. It also makes building and maintaining your digital library more complicated. For these reasons, we are committed to remaining DRM-free."
I'm really happy to see this and I hope more ebook sellers follow suit. Some of the others already doing DRM-free can be found on our list at Defective by Design. Also see StoryBundle, which just launched and offers a set of DRM-free ebooks on a pay-what-you-want basis, with some of the money going to charity.

30 July 2012

Johannes Schauer: port bootstrap build-ordering tool report 4

A copy of this post is sent to soc-coordination@lists.alioth.debian.org as well as to debian-bootstrap@lists.mister-muffin.de.

Diary

July 2
  • playing around with syntactic dependency graphs and how to use them to flatten dependencies

July 4
  • make work with dose 3.0.2
  • add linux-amd64 to source architectures
  • remove printing in build_compile_rounds
  • catch Not_found exception and print warning
  • use the whole installation set in crosseverything.ml instead of flattened dependencies
  • detect infinite loop and quit in crosseverything.ml
  • use globbing in _tags file
  • use wildcards and patsubst in makefile

July 5
  • throw a warning if there exist binary packages without source packages
  • add string_of_list and string_of_pkglist and adapt print_pkg_list and print_pkg_list_full to use them
  • fix and extend flatten_deps - now also tested with Debian Sid

July 6
  • do not exclude the crosscompiled packages from being compiled in crosseverything.ml
  • clean up basebuildsystem.ml, remove old code, use BootstrapCommon code
  • clean up basenocycles.ml, remove unused code and commented out code
  • add option to print statistics about the generated dependency graph
  • implement most_needed_fast_wrong as well as most_needed_slow_correct and make both available through the menu

July 7
  • allow to investigate all scc, not only the full graph and the scc containing the investigated package
  • handle Not_found in src_list_from_bin_list with warning message
  • handle the event of the whole archive actually being buildable
  • replace raise Failure with failwith
  • handle incorrectly typed package names
  • add first version of reduced_dist.ml to create a self-contained mini distribution out of a big one

July 8
  • add script to quickly check for binary packages without source package
  • make Debian Sid default in makefile
  • add *.d.byte files to .gitignore
  • README is helpful now
  • more pattern matching and recursiveness everywhere

July 9
  • fix termination condition of reduced_dist.ml
  • have precise as default ubuntu distribution
  • do not allow to investigate an already installable package

July 10
  • milestone: show all cycles in a graph
  • add copyright info (LGPL3+)

July 11
  • advice to use dose tools in README

July 16
  • write apt_pkg based python filter script replacing grep-dctrl

July 17
  • use Depsolver.listcheck more often
  • add dist_graph.ml
  • refactor dependency graph code into its own module

July 18
  • improve package selection for reduced_dist.ml
  • improve performance of cycle enumeration code

July 20
  • implement buildprofile support into dose3

July 22
  • let dist_graph.ml use commandline arguments

July 23
  • allow dose3 to generate source package lists without Build-Depends-Indep and Build-Conflicts-Indep

July 29
  • implement crosscompile support into dose3

Results

Readme There is not yet a writeup on how everything works and how all the pieces of the code work together, but the current README file provides a short introduction on how to use the tools:
  • build and runtime dependencies
  • compile instructions
  • execution examples for each program
  • step by step guide how to analyze the dependency situation
  • explanation of general commandline options
A detailed writeup about the inner workings of everything will be part of a final documentation stage.

License All my code is now released under the terms of the LGPL, either version 3 or (at your option) any later version. A special linking exception is made to the license, which can be read at the top of the provided COPYING file. The exception is necessary because Ocaml links statically, which means that without it the conditions of distribution would basically equal those of the GPL3+.

reduced_dist.ml The Debian archive especially is huge, and one might want to work on a reduced selection of packages first. A smaller selection of the archive would be significantly faster to process and would also not add thousands of packages that are not important for an extended base system. I call a set of source packages A and a set of binary packages B a reduced distribution if they fulfill the following properties:
  • all source packages A must be buildable with only binary packages B being available
  • all binary packages B except for architecture:all packages must be buildable from source packages A
The set of binary packages B and source packages A can be retrieved using the reduced_dist program. It allows one either to build the most minimal reduced distribution or one that includes a certain package selection. To filter the package control stanzas for a reduced distribution out of a full distribution, I originally used a call to grep-dctrl but later replaced that with a custom python script called filter-packages.py. This script uses python-apt to filter Packages and Sources files for a certain package selection.
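The real filter-packages.py relies on python-apt; the core idea can be sketched without that dependency as a plain filter over blank-line-separated control stanzas (function name and sample data below are my own, purely illustrative):

```python
def filter_stanzas(text, wanted):
    """Keep only the control stanzas (RFC822-style, blank-line separated)
    whose Package field is in the wanted set.  A toy stand-in for
    filter-packages.py, which uses python-apt for the real thing."""
    kept = []
    for stanza in text.strip().split("\n\n"):
        for line in stanza.splitlines():
            if line.startswith("Package:"):
                if line.split(":", 1)[1].strip() in wanted:
                    kept.append(stanza)
                break
    return "\n\n".join(kept)

packages = """Package: gcc
Version: 4.7

Package: coreutils
Version: 8.13

Package: vala
Version: 0.16"""

print(filter_stanzas(packages, {"gcc", "vala"}))
```

Running it keeps only the gcc and vala stanzas; the real script does the same for full Packages and Sources files.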

dist_graph.ml It soon became obvious that there were not many independent dependency cycle situations but just one big scc containing 96% of the packages that are involved in build dependency cycles. Therefore it made sense to write a program that does not iteratively build the dependency graph starting from a single package, but instead builds a dependency graph for a whole archive.

Cycles I can now enumerate all cycles in the dependency graph. I covered the theoretical part in another blog post and wrote an email about the achievement to the list. Both resources contain more links to the respective sourcecode. The dependency graph generated for Debian Sid has 39486 vertices. It has only one central scc with 1027 vertices and only eight other scc with 2 to 7 vertices each. All the other source and binary packages in the dependency graph for the archive are degenerate components of length one. Obtaining the attached result took 4 hours on my machine (Core i5 @ 2.53GHz): 1.5 h of that were needed to build the dependency graph, the other 2.5 hours were needed to run Johnson's algorithm on the result. Memory consumption of the program was at about 700 MB. To my joy, both the runtime of the cycle finding algorithm for a whole Debian Sid repository and its memory requirements are within orders of magnitude that are justifiable on off-the-shelf hardware. It must also be noted that nothing is optimized for performance yet. A list of all cycles in Debian Sid up to length 4 can be retrieved from this email. This cycle analysis assumes that only essential packages, build-essential with its dependencies, and debhelper are available. Debhelper is not an essential or build-essential package, but 79% of the archive build-depends on it. The most interesting cycles are probably those of length 2 that need packages that they build themselves. Noticeable examples of these situations are vala, python, mlton, fpc, sbcl and ghc. Languages, it seems, love to need themselves to be built.
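The scc figures above come from standard strongly-connected-component analysis. As a minimal illustration of that step (the project itself works in Ocaml with ocamlgraph; this Kosaraju-style Python sketch is mine and only meant for toy graphs):

```python
def sccs(n, edges):
    """Label each vertex of a directed graph with its strongly connected
    component id, using Kosaraju's two-pass DFS.  Recursive, so only
    suitable for small illustrative graphs."""
    adj = [[] for _ in range(n)]
    radj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)

    seen = [False] * n
    order = []                      # vertices in order of DFS completion

    def dfs1(u):
        seen[u] = True
        for w in adj[u]:
            if not seen[w]:
                dfs1(w)
        order.append(u)

    for u in range(n):
        if not seen[u]:
            dfs1(u)

    comp = [-1] * n                 # component id per vertex

    def dfs2(u, c):                 # second pass on the reversed graph
        comp[u] = c
        for w in radj[u]:
            if comp[w] == -1:
                dfs2(w, c)

    c = 0
    for u in reversed(order):
        if comp[u] == -1:
            dfs2(u, c)
            c += 1
    return comp

# 0 and 1 depend on each other (one scc); 2 is a degenerate component.
print(sccs(3, [(0, 1), (1, 0), (1, 2)]))  # -> [0, 0, 1]
```

For the archive-sized graph the same analysis yields the single 1027-vertex scc described above.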

Buildprofiles There is a long discussion about how to encode staged build dependency information in source packages. While the initial idea was to use Build-Depends-StageN fields, this solution would duplicate large parts of the Build-Depends field, which leads to bitrot, and it is inflexible with regard to possible other build "profiles". To remedy the situation it was proposed to use field names like Build-Depends[stage1 embedded], but this would also duplicate information and would break the rfc822 format of package description files. A document maintained by Guillem Jover gives even more ideas and details. Internally, Patrick and I decided on another of Guillem Jover's ideas for annotating staged build dependencies. The format reads like:
Build-Depends: huge (>= 1.0) [i386 arm] <!embedded !bootstrap>, tiny
So each build profile follows a dependency in <> "brackets" and has a format similar to architecture options. Patrick has a patch for dpkg that implements this functionality, while I patched dose3.
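A rough sketch of how such an annotated dependency entry could be tokenized (my own illustrative regex, not Patrick's dpkg patch nor my dose3 patch):

```python
import re

# Parse entries like: huge (>= 1.0) [i386 arm] <!embedded !bootstrap>
DEP_RE = re.compile(
    r"(?P<name>[a-z0-9][a-z0-9+.-]*)"   # package name
    r"(?:\s*\((?P<version>[^)]+)\))?"   # optional version restriction
    r"(?:\s*\[(?P<arches>[^\]]+)\])?"   # optional architecture list
    r"(?:\s*<(?P<profiles>[^>]+)>)?"    # optional build profile annotation
)

def parse_dep(entry):
    """Split one dependency entry into name/version/arches/profiles."""
    d = DEP_RE.match(entry.strip()).groupdict()
    d["profiles"] = d["profiles"].split() if d["profiles"] else []
    return d

dep = parse_dep("huge (>= 1.0) [i386 arm] <!embedded !bootstrap>")
print(dep["name"], dep["profiles"])  # huge ['!embedded', '!bootstrap']
```

A plain entry like "tiny" simply comes back with an empty profile list, matching the example line above.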

Dropping Build-Depends-Indep and Build-Conflicts-Indep When representing the dependencies of a source package, dose3 concatenates its Build-Depends and Build-Depends-Indep dependency information. So up to now, a source package could only be compiled if it managed to compile all of its binary packages including architecture:all packages. But when bootstrapping a new architecture, it should be sufficient to only build the architecture dependent packages and therefore to only build the build-arch target in debian/rules and not the build-indep target. Considering only the Build-Depends field and dismissing the Build-Depends-Indep field reduced the main scc from 1027 vertices to 979 vertices. The number of cycles up to length four dropped from 276 to 206. Especially the cycles containing gtk-doc-tools, doxygen, debiandoc-sgml and texlive-latex-base became much rarer. Patrick has managed to add a Build-Depends-Indep field to four packages so far, which reduced the scc by another 14 vertices, down to 965 vertices. So besides staged build dependencies and cross building there is now a third method that can be applied to break dependency cycles: add Build-Depends-Indep information to packages or update existing information. I submitted to the list a list of packages that have a binary-indep and/or a build-indep target in their debian/rules. I also submitted a patch for dose3 that makes it possible to ignore Build-Depends-Indep and Build-Conflicts-Indep information.
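The concatenation that dose3 performs, and the effect of dropping the -Indep part, can be sketched in a few lines (toy stanzas and function name are mine, not real dose3 code):

```python
def build_deps(stanza, with_indep=True):
    """Effective build dependencies of a source stanza.  dose3 concatenates
    Build-Depends and Build-Depends-Indep; for bootstrapping, the -Indep
    part can be dropped so only build-arch requirements remain."""
    def split(field):
        return [d.strip() for d in stanza.get(field, "").split(",") if d.strip()]

    deps = split("Build-Depends")
    if with_indep:
        deps += split("Build-Depends-Indep")
    return deps

src = {"Build-Depends": "gcc, make", "Build-Depends-Indep": "doxygen"}
print(build_deps(src))         # ['gcc', 'make', 'doxygen']
print(build_deps(src, False))  # ['gcc', 'make']
```

Dropping doxygen here is exactly the kind of reduction that shrank the main scc from 1027 to 979 vertices.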

Dose3 crossbuilding So far I have only looked at dependency situations in the native case. While the native case contains a huge scc of about 1000 packages, the dependency situation will be much nicer when cross building. But dose3 was not yet able to simulate cross building of source packages. I wrote a patch that implements this functionality, which will allow me to write programs that help analyze the cross situation as well.

Debconf Presentation Wookey gave a talk at debconf12 for which I supplied the slides. The slides in their final version can be downloaded here

Future Patrick maintains a list of "weak" build dependencies. Those are dependencies that are very likely to be droppable in either a staged build or by using Build-Depends-Indep. I must make use of this list to find packages whose dependencies can easily be dropped. I will have to implement support for resolving the main scc using staged build dependencies. Since it is unlikely that Patrick will be fast enough in supplying me with modified packages, I will need to create a database of dummy packages myself. Another open task is to make it possible to analyze the crossbuilding dependency situation. What I'm currently more or less waiting on is the inclusion of my patches into dose3 as well as a decision on the buildprofile format. More people need to discuss it before it can be included in tools as well as policy. Every maintainer of a package can help make bootstrapping easier by making sure that as many dependencies as possible are part of the Build-Depends-Indep field.

4 July 2012

Johannes Schauer: enumerating elementary circuits of a directed_graph

For my GSoC project this year I need to be able to enumerate all elementary circuits of a directed graph. My code is written in Ocaml, but neither the ocamlgraph library nor graph libraries for other languages seem to implement a well tested algorithm for this task. Lacking such a well tested solution, I decided to implement a couple of different algorithms. Since it is unlikely that different algorithms yield the same wrong result, I can be reasonably certain that each individual algorithm works correctly if they all agree on a single solution. To that end I wrote a testsuite, containing an unholy mixture of Python, Ocaml, D and Java code, which implements algorithms by D. B. Johnson, R. Tarjan, and K. A. Hawick and H. A. James.

Algorithm by R. Tarjan The earliest algorithm that I included was published by R. Tarjan in 1973.
Enumeration of the elementary circuits of a directed graph
R. Tarjan, SIAM Journal on Computing, 2 (1973), pp. 211-216
http://dx.doi.org/10.1137/0202017
I implemented the pseudocode given in the paper using Python. The git repository can be found here: https://github.com/josch/cycles_tarjan

Algorithm by D. B. Johnson The algorithm by D. B. Johnson from 1975 improves on the complexity of Tarjan's algorithm.
Finding all the elementary circuits of a directed graph.
D. B. Johnson, SIAM Journal on Computing 4, no. 1, 77-84, 1975.
http://dx.doi.org/10.1137/0204007
In the worst case, Tarjan's algorithm has a time complexity of O(n e(c+1)) whereas Johnson's algorithm supposedly manages to stay in O((n+e)(c+1)) where n is the number of vertices, e is the number of edges and c is the number of cycles in the graph. I found two implementations of Johnson's algorithm. One was done by Frank Meyer and can be downloaded as a zip archive. The other was done by Pietro Abate and the code can be found in a blog entry which also points to a git repository. The implementation by Frank Meyer seemed to work flawlessly. I only had to add code so that a graph could be given via commandline. The git repository of my additions can be found here: https://github.com/josch/cycles_johnson_meyer Pietro Abate implemented an iterative and a functional version of Johnson's algorithm. It turned out that both yielded incorrect results as some cycles were missing from the output. A fixed version can be found in this git repository: https://github.com/josch/cycles_johnson_abate

Algorithm by K. A. Hawick and H. A. James The algorithm by K. A. Hawick and H. A. James from 2008 improves further on Johnson's algorithm and does away with its limitations.
Enumerating Circuits and Loops in Graphs with Self-Arcs and Multiple-Arcs.
K. A. Hawick and H. A. James, in Proceedings of FCS 2008, 14-20
www.massey.ac.nz/~kahawick/cstn/013/cstn-013.pdf
In contrast to Johnson's algorithm, the algorithm by K. A. Hawick and H. A. James is able to handle graphs containing edges that start and end at the same vertex as well as multiple edges connecting the same two vertices. I do not need this functionality but added the code as additional verification. The paper provides extensive code snippets written in the D programming language. A full, working version with all pieces connected together can be found here: https://github.com/josch/cycles_hawick_james The algorithm was verified using example output given in the paper. The project README states how to reproduce it.

Input format All four codebases have been modified to produce executables that take the same commandline arguments, which describe the graph to investigate. The first argument is the number of vertices of the graph. Subsequent arguments are ordered pairs of comma separated vertices that make up the directed edges of the graph. Let's look at the following graph as an example. The DOT source for this graph is:
digraph G {
  0;
  1;
  2;
  0 -> 1;
  0 -> 2;
  1 -> 0;
  2 -> 0;
  2 -> 1;
}
To generate the list of elementary circuits using Tarjan's algorithm for the graph above, use:
$ python cycles.py 3 0,1 0,2 1,0 2,0 2,1
0 1
0 2
0 2 1
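The shared input format is simple enough to parse in a few lines. The following helper is hypothetical, written for illustration, and is not taken from any of the four codebases:

```python
def parse_graph(args):
    """Parse the shared commandline format: the first argument is the
    vertex count, each remaining one a comma-separated directed edge."""
    n = int(args[0])
    edges = [tuple(int(v) for v in arg.split(",")) for arg in args[1:]]
    return n, edges

# parse_graph(["3", "0,1", "0,2", "1,0", "2,0", "2,1"])
# → (3, [(0, 1), (0, 2), (1, 0), (2, 0), (2, 1)])
```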
The commandline arguments are exactly the same for the other three methods and yield the same result in the same order. If the DOT graph is in a format as simple as the above, the following shell construct can be used to generate the commandline arguments that represent the graph:
$ echo $(sed -n -e '/^\s*[0-9]\+;$/p' graph.dot | wc -l) $(sed -n -e 's/^\s*\([0-9]\) -> \([0-9]\);$/\1,\2/p' graph.dot)

Testsuite As all four codebases take the same input format and have the same output format, it is now trivial to write a testsuite that compares the individual output of each algorithm for the same input and checks for differences. The code of the testsuite is available via this git repository: https://github.com/josch/cycle_test The other four repositories exist as submodules of the testsuite repository.
$ git clone git://github.com/josch/cycle_test.git
$ cd cycle_test
$ git submodule update --init
A test run is started by calling:
$ ./test.sh 11
The argument to the shell script is an integer denoting the maximum number N of vertices for which graphs will be generated. The script will compile the OCaml, Java and D sourcecode of the submodules as well as an OCaml script called rand_graph.ml, which generates random graphs with v = 1..N vertices. For each number of vertices v, e = 1..M numbers of edges are chosen, where M is the maximum number of edges for that number of vertices. For every combination of number of vertices v and number of edges e, a graph is randomly generated using Pack.Digraph.Rand.graph from the ocamlgraph library. Each generated graph is checked for cycles and written to a file graph-v-e.dot if it contains one. test.sh then loops over all generated dot files. It uses the sed expression shown above to convert each dot file into commandline arguments for each of the algorithms. The outputs of the algorithms are then compared with each other, and the loop only continues if they do not differ; otherwise the script exits with a non-zero exit code. The tested algorithms are the Python implementation of Tarjan's algorithm, the iterative and functional OCaml implementations of Johnson's algorithm, the Java implementation of Johnson's algorithm and the D implementation of the algorithm by Hawick and James.
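The comparison step test.sh performs could be sketched in Python roughly as follows. The command names here are placeholders, not the actual submodule paths, and this is an illustration of the idea rather than a translation of the script:

```python
import subprocess

def outputs_agree(graph_args, commands):
    """Run every implementation on the same graph arguments and check
    that all of them print identical output.
    graph_args: e.g. ["3", "0,1", "0,2", "1,0", "2,0", "2,1"]
    commands: argv prefixes, e.g. [["python", "cycles.py"], ["./hawick"]]"""
    outputs = [
        subprocess.run(cmd + graph_args, capture_output=True, text=True).stdout
        for cmd in commands
    ]
    return all(out == outputs[0] for out in outputs)
```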

Future developments Running the testsuite with a maximum of 12 vertices takes about 33 minutes on a 2.53GHz Core2Duo and produces graphs with as many as 1.1 million cycles. All five implementations agree on the output for all 504 generated graphs that were used as input. If there is another implementation of an algorithm that enumerates all elementary circuits of a directed graph, I would like to add it. There are some more papers that I would like to read, but I lack access to epubs.siam.org and ieeexplore.ieee.org and would have to buy them. Benchmarks seem a bit pointless, as not only are the algorithms very different from each other (and there are many ways to tweak each of them) but the programming languages differ as well. Still, for the curious, here is the time each algorithm takes to enumerate all cycles for all generated graphs with up to 11 vertices.
algorithm                            time (s)
Johnson, Abate, OCaml, iterative     137
Johnson, Abate, OCaml, functional    139
Tarjan, Python                       153
Hawick, D                            175
Johnson, Meyer, Java                 357
The iterative OCaml code performs about as well as the functional one. In practice, the iterative code should probably be preferred, as the functional code is not tail recursive. On the other hand, it is unlikely that cycles ever grow big enough for the stack space used to make a difference. The Python implementation executes surprisingly fast, given that Tarjan's algorithm is supposedly inferior to Johnson's and that Python is interpreted, but the Python implementation is also the simplest one with the fewest required data structures. The D code potentially suffers from the bigger data structures and other bookkeeping required to support multi-arcs and self-arcs. The Java code implements a whole graph library, which might explain some of its slowness.

2 July 2012

Johannes Schauer: port bootstrap build-ordering tool report 3

A copy of this post is sent to soc-coordination@lists.alioth.debian.org as well as to debian-bootstrap@lists.mister-muffin.de.

Diary

June 18 Pietro suggests a faster way to generate installation sets for a list of packages. In my case, I need an installation set for every source package in the archive to find out how often a binary package is needed to build a source package. As a result, the speed is doubled compared to the original approach.

June 19
  • adapt code to work with new dose release 3.0
  • remove unneeded parts of code
  • add different possibilities to find the number of source packages that need a binary package
  • add code to get multiple installation sets using Depsolver_int.solve

June 20
  • add ~global_constraints:false to Depsolver.listcheck, Depsolver.trim and Depsolver.edos_install calls
  • adapt output graph to limited xdot capabilities

June 21 I write an email to the list, reporting on the dependency graphs of debhelper, cdbs, pkg-config and libgtk2.0-dev. My current technique gets an installation set for a source package, removes all packages that are already installable and adds the others as a dependency of that source package. Each such dependency will in turn include an installation set of that binary package, minus all packages that are already available. The problem with that approach is dependency cycles created by long dependency chains. Example: src:A needs B needs C needs A. B as well as C would be added as a dependency of src:A, and both would also include their installation set, which in both cases includes A. So now there are two cycles: src:A->B->A and src:A->C->A. For a real life example, look at the following situation of cdbs and src:sqlite3. cdbs old situation It is created because src:sqlite3 needs cdbs needs python-scour needs python needs python2.7 needs libsqlite3-0. Therefore libsqlite3-0 is in the installation sets of cdbs, python-scour, python and python2.7. This creates five cycles in the graph even though there is really only one. It would be better to reduce the dependencies added to src:sqlite3 to its direct dependency, which is cdbs. Package dependencies are disjunctions from which the solver chooses one alternative or the other to build an installation set. To solve the problem above I would need to know which disjunction the solver chose, and then only add the direct dependency of a package to the dependency graph.
  • improve build_compile_rounds performance
  • big overhaul of menu structure
  • fix subgraph extraction code

June 22
  • do not create a universe if not needed - use hashtables instead
  • for sorting packages, generating difference of package sets and otherwise comparing packages, always use CudfAdd.compare
  • as a custom list membership function, use List.exists instead of trying List.find
  • more speedup for build_compile_rounds
  • the number of source packages that can be built does NOT include the cross built packages
  • print closure members in graph
  • refactor code and move common functions to bootstrapCommon.ml
  • add breakcycles.ml for future code to break cycles using staged build dependencies
  • use more extlib functionality
  • extended package list input format

June 23 After several emails with Pietro I learn about syntactic dependency graphs. To document my progress and prove my efforts I committed the code as commit 6684c13. But this code soon turned out to be unnecessary, so it will be removed later and just serves as documentation.

June 24 I came up with another (better?) solution to get the chosen disjunctions. It simply uses the calculated installation set to decide, for each disjunction, which alternative was taken by the solver. I reported this important step and the open questions involved with it in an email to the list. The problem always was that an installation set can easily contain more than one package of a disjunction. In this case it is not clear which branch was chosen by the solver. I found that in Ubuntu Natty there are only 6 such packages, and for each of them the situation can be resolved: in all of those cases either one package of the disjunction provides the other, or both packages depend upon each other, which means that both have to be included.
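The idea can be sketched as follows. Package names stand in for the CUDF package objects the real OCaml code operates on, so this is an illustration of the method, not the actual implementation:

```python
def chosen_branches(disjunctions, installation_set):
    """For each dependency disjunction, return the alternatives that the
    solver actually put into the installation set. A result list with
    more than one entry is exactly the ambiguous case described above."""
    return [[alt for alt in alts if alt in installation_set]
            for alts in disjunctions]

deps = [["debhelper"], ["python2.7", "python3"]]
inst = {"debhelper", "python2.7", "libsqlite3-0"}
# chosen_branches(deps, inst) → [["debhelper"], ["python2.7"]]
```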

June 27
  • use installation set to flatten build dependencies of source packages
  • refactor code and move common functions to bootstrapCommon.ml

June 25 I need an algorithm that finds all circuits in a given graph. This is necessary so that:
  1. cycles can be enumerated for huge dependency graphs where cycles are hard to see
  2. cycles can be enumerated to find a cycle that can be broken using staged build dependencies
It seems that Johnson's algorithm is the best way to do this complexity-wise, and Pietro already blogged about the problem together with an implementation of the algorithm in OCaml. Unfortunately it turns out that his code doesn't implement the algorithm correctly and hence misses some cycles. The fix seems not to be trivial, so I'm still investigating it.

June 28
  • add crosseverything.ml to obtain a list of source packages that, if cross compiled, would make the whole archive available

Results While the first week was productive as usual, I had to spend some time on a university project during the second week, as well as attend a family meeting. I will catch up on the lost time over the course of the next week.

dose3 Using dose 3.0 (which fixes a bug regarding essential packages), the output of the algorithms is now likely less wrong than before.

performance Performance was improved in the generation of installation sets as well as in the code that tries out how many packages can be built in multiple rounds. This was achieved by more caching, fewer unnecessary operations in looping constructs, replacing lists with hashtables, and not creating universes where they are not necessary.

user interface The main program, basenocycles.ml now has a much better menu structure.

input format The programs take two package files as input: the list of source packages that has to be cross built for a minimal build system, and the list of source packages that was chosen to be cross compiled in addition to that. Both files list one source package per line and now allow comments.

refactoring As more and more scripts are added, more and more functionality is moved to bootstrapCommon.ml which makes each script much cleaner.

what to test for cross building As discussed in the "Future" section of the last report, I now automated the process of finding out which packages, if they were cross compiled, would make the whole archive available because they break all cycles and allow native compilation of the rest. The outcome: to build 3333 out of 3339 packages in natty natively, at most 186 source packages must be cross compiled. The other six cannot be compiled because of version mismatches in the Natty Sources.bz2 and Packages.bz2. The code can be run from crosseverything.ml.

limit source dependencies to direct dependencies Reducing the dependencies of source packages from their full installation set to their direct dependencies, by finding out which disjunctions of their dependency list were taken, greatly simplifies the dependency graphs. The dependency graph created for libgtk2.0-dev could be reduced from 491 to 247 vertices for a depth of three. For cdbs it is now clearly visible that cdbs depends on libsqlite3-0, which builds from src:sqlite3, which depends on cdbs. Before: cdbs old situation After: cdbs new situation For pkg-config the graph has also been reduced to the one single cycle that matters: src:pkg-config needs libglib2.0-dev, which depends on pkg-config, which builds from src:pkg-config. Before: pkg-config old situation After: pkg-config new situation

Future I will prepare content for wookey's debconf talk on crossbuilding and bootstrapping. As this will include directions how to use the current code, I will kill two birds with one stone and write some proper documentation for my current source. The following two lists will be displayed after a dependency graph is calculated and reduced to its scc:
  • those source packages that have the least build dependencies not fulfilled. Those might be candidates for easy staged build dependencies. Since the source package is part of the scc, it will definitely be involved in some cycle somewhere.
  • those binary packages that most source packages depend upon. Those could be candidates for cross compilation as it might be easier to cross compile the source package than using staged build dependencies.
Patrick managed to cross build packages with sbuild now, so the list of packages that crosseverything.ml produces can be checked efficiently for cross buildability. With this list, potentially more cycles can be broken out of the box. A feature will be added that allows the user to remove all packages from a dependency graph that can be cross compiled without any additional effort. Version mismatches between source and binary packages in Sources.bz2 and Packages.bz2 respectively in Ubuntu make the scripts fail and/or produce wrong results. Debian (even Sid) doesn't have this problem, so I should find out where to report it to Ubuntu. I need to write a working version of Johnson's algorithm because much functionality depends upon it. I have the option of improving Pietro's version or writing one from scratch. Writing one from scratch might be easier, as I have Pietro's code as a template as well as a Java implementation of Johnson's algorithm which seems to work. The following functionalities need working cycle enumeration:
  • given source packages with staged build dependencies, an enumeration of cycles is needed to find out which cycles can be broken by building packages staged. It makes little sense to blindly build a package staged and then check if this makes more packages available.
  • display cycles of a dependency graph to the user. After obtaining all cycles in the graph it makes sense to sort them by their length. The user would then investigate the situation of the smallest cycles first. This makes sense because breaking small cycles can potentially break bigger cycles. Since in the end, all cycles have to be eliminated anyway, it makes sense for the user to first tackle the small ones.
  • display the feedback arc set to the user. The packages in the feedback arc set might be very good candidates for reduced build dependencies or cross compilation.

4 June 2012

John Sullivan: To nook

My curiosity was definitely nookd by this story. While this doesn't seem to have been an act of censorship or anything nefarious by Barnes and Noble itself, this does point to just how terrifying ebooks can be if we aren't careful. In the future, when we are dealing with the entirety of human knowledge, we better be careful with the find and replace function. And we better make sure we control our computer systems so that other people can't nook us without our permission. See also: As someone who read War and Peace in paper form, much of it in bed, I definitely empathized with this:
Some weeks ago I decided that I wanted to read Tolstoy's War and Peace. Lou Ann loaned me her copy. At more than 1100 pages, reading it in bed required as much strength as balancing a box of bricks in my hands. In my senior years I have developed arthritis in my thumbs, which made the effort not only difficult, but painful.

9 April 2012

John Sullivan: Photos from Belgium

I put up (with almost no filtering) the photos I took while visiting Brussels for FOSDEM in February. Most are from the Atomium.

5 March 2012

John Sullivan: Plus One

In mailing list discussions, I've noticed that the practice of replying "+1" to messages with which one agrees has become very widespread. I think this is interesting for a lot of reasons, such as what it shows about habits picked up in communicating via social network sites that grew out of e-mail percolating back to e-mail. We seem to have really grown to like this idea of "voting" for people's messages, and if there's no "Like" or "+1" button on their message, we'll make our own. (Likewise, apparently a person's name is no longer sufficient to indicate that one is addressing them in an e-mail -- am I right, @dear_reader?) The problem for me is that I have my venerable awesome e-mail client Gnus set to hide quoted text in messages by default with (add-hook 'gnus-article-prepare-hook (lambda () (gnus-article-hide-citation 1))), and it thinks that "+" is the start of a quoted line -- so it hides the "+1". While there's a certain poetic justice to that, it confuses me because it makes the message look empty, and someone could do something like "+1 I'm pregnant too", causing me to miss some very important news. The fix for this is to change the regexp pattern Gnus uses to decide if a line is a message quote or not. gnus-message-cite-prefix-regexp is built using message-cite-prefix-regexp, so I changed the value of the latter, removing the literal "+". I also found while investigating this that I already configure that variable in order to add "#>" which someone sometime during my 9 years of using Gnus must have used. The net result is (setq message-cite-prefix-regexp "\\([ ]*[-_.#[:word:]]+>+\\|[ ]*[]>|]\\)+"). (Note that some of those whitespace characters are tabs, which probably won't display properly here, but you can compare to the default value to see what I actually changed). +1?

18 February 2012

Asheesh Laroia: Help a BSD developer bike across the US, and give hope to cancer communities

'Cancer' is a cluster of diseases, a betrayal by the majesty and power of the development program that constructed and heals us.
Support Venkatesh's bike ride, and alleviate the toll of cancer.
My friend Venkatesh, pictured above, is going to bike four thousand miles, all the way across the continental US, from Baltimore to Portland. He's doing it to raise money for the Ulman Cancer Fund for Young Adults. I'm writing this because I want you to donate money to his cause. He's a DragonFly BSD developer, loves bikes, and your donation could make a big difference. I first met Venkatesh through the Johns Hopkins computer club, an ACM chapter. I was the head of the club, and he had just started his career at Hopkins. He was looking for advice on running Brickwiki, the LEGO encyclopedia. Quickly, I became his friend; in that time, I've learned the following things about him. In the years since I graduated from Hopkins, I've been impressed by Venkatesh's ongoing curiosity and contributions to open source projects like DragonFly. I'm honored to have this chance to help him bike across the country for a good cause. Here is a quick word about the 4K for cancer effort:
Since 2002, groups of college students have undertaken a 70 day, 4000+ mile summer bike ride across the United States with the goal of offering hope, inspiration and support to cancer communities along the way. This past summer was our 10th year of cycling across the country as 76 volunteers rode along three different routes: Baltimore to San Francisco, Baltimore to Portland, and Baltimore to Seattle. Our riders raised a combined $476,000 to support organizations and individuals in the fight against cancer.
His fundraising goal is $5,000. Anything from $5 to $500 is a donation to an organization that helps young adult cancer survivors and their families get access to information and support resources. Can you help?

18 January 2012

John Sullivan: Sent to the ACLU today

I was on the brink of mindlessly clicking through the ACLU action center as usual to send an email opposing SOPA. But then I read their boilerplate text, and ended up cancelling the letter to my rep and instead sending this quick note to the ACLU:
Your SOPA suggested letter text supports current copyright law, and also backhandedly supports PIPA (the Senate version of the bill). This is far too weak of a position. As a donor, I ask you to take a stronger position that current copyright law unjustly restricts free speech, and that no further enforcement measures should be instituted until that fundamental problem is addressed. At least take on BOTH of these bills strongly. Most of the significant Internet is blacked out today to oppose both bills -- why would you cede so much ground to copyright maximalists? We have the support to oppose and defeat both bills.
For reference, here was their text:
While I believe it's important to protect copyrighted material online, the language of the Stop Online Piracy Act (SOPA) is flawed and will lead to the blocking of lawful content. Unlike the Senate version of the bill, SOPA eliminates the concept of sites 'dedicated to infringing activity' and enables law enforcement to target all sites that contain some infringing content -- no matter how trivial. The potential for impact on non-infringing content is much greater under SOPA than under other versions of this bill. Sites with user-generated content, like YouTube, Twitter, and Facebook, would be especially vulnerable, as one small piece of infringing content could lead to blocking the entire site. Even though proposed changes would narrow the amount of lawful content impacted, the changes don't go far enough. It is still likely that search engines will end up blocking access to perfectly legal online content. Congress should focus not just on the goal of protecting copyright owners, but also protecting the speech rights of consumers and providers who are reading and producing wholly non-infringing content. Congress must eliminate the collateral damage to protected non-infringing content. Only in that way will Congress truly achieve its goal of protecting authors while respecting the constitutional right to free speech.
Maybe I'm overreacting, but I dislike it when good organizations take weak positions unnecessarily. Usually this is not a problem with the ACLU, for me. It doesn't help that I keep seeing this meme everywhere in the anti-SOPA/PIPA conversation: "I agree we need to do something about piracy, but not this..." I don't think we need to do anything to fix violations of an extraordinarily unjust law until the law itself is fixed. I don't find that to be a very radical position.

15 January 2012

John Sullivan: At FOSDEM in February

I will be helping to represent the FSF at FOSDEM next month in Brussels. I'm speaking in the Legal Issues Devroom on Saturday 2012-02-04. The presentation is called "Is copyleft being framed?":
This short talk will address the following questions, to inspire discussion and contemplation about how we frame descriptions of the state of licensing in free software.
  • Numbers are increasingly being cited to show that the use of copyleft licenses, specifically the GPL, is declining. What do these numbers actually show, who is propagating them, and why? What do or might other numbers show?
  • Is the "percentage of free software projects which use copyleft licenses" a useful way to judge the success of copyleft? Does an increase in the percentage of projects using non-copyleft permissive licenses indicate a failure of copyleft?
  • As a small related case study, what role have the licensing terms of popular mobile application stores played in this debate, and how have those terms changed the frame of the discussion?
Let me know if you'll be there too!

19 December 2011

John Sullivan: Where Shall I Wander

Google Maps is now mapping the indoors. I saw an ad for this while passing through MSP yesterday (given how much time I spend on the Internet, it's strange and a little embarrassing to learn about new things on the Internet from airport billboards), since one of their initial targets is the infamous Mall of America.
Detailed floor plans automatically appear when you're viewing the map and zoomed in on a building where indoor map data is available. The familiar blue dot icon indicates your location within several meters, and when you move up or down a level in a building with multiple floors, the interface will automatically update to display which floor you're on. All this is achieved by using an approach similar to that of My Location for outdoor spaces, but fine-tuned for indoors.
Thoughts about this: Title from John Ashbery

30 November 2011

Russell Coker: Links November 2011

Forbes has an interesting article about crowd-sourcing by criminals and law enforcement [1]. Ulissescastr0 made a Youtube video showing how to install SE Linux on Debian/Etch [2]. Probably no-one is using Etch nowadays so this video is outdated, but it's a good way of teaching people. It would be good if someone made a similar video showing how to do SE Linux things on Squeeze. I discovered the above SE Linux video through Explow, which provides a neat interface to multiple searches and information sources [3]. I don't think I will be using Explow much in future as I could get the same result through Google video search. They also have a news portal, but there are other sites for that. Still, it does seem that Explow would be useful for newbies. Eric Michael Johnson wrote an interesting article about the inherent bias in psychological research based in the US [4]. People who live in urban environments think differently in some ways from people who live in different environments or who have different lifestyles. Therefore generalising from university students in the US to the entire human race is likely to give incorrect results. This is something to consider the next time you are tempted to generalise to the wider population from your own friends, colleagues, etc. The Daily Kos has a scary article about the TSA having a woman detained for reciting part of the US constitution [5]. The US will remain on my list of countries to avoid for the foreseeable future. Vorlon has written an informative article about the use of hardening options when building Debian packages [6]. It's now even easier to do this, so every package that simultaneously deals with data of differing levels of integrity or sensitivity should be built this way. Bunker Roy gave an interesting TED talk about his Barefoot College, which teaches useful skills to people in rural parts of India who don't have a traditional school education [7].
His talk really shows up some of the arrogance in the people who run traditional education. Justin Hall-Tipping gave an interesting TED talk about ways of solving the world energy problems [8]. He started with explaining the problems and why they need to be urgently solved and then described in detail some of the research that his group has done to solve the problems. This includes flexible photo-voltaic cells, infra-red vision to save on lighting, and a way of using carbon nano-tubes to control the thermal properties of windows.
